One-sample test

We first create the data. In particular, we create a categorical vector with two categories (yes = 1, no = 0):

set.seed(123)
x <- rbinom(n = 60, size = 1, prob = 0.3)

Null hypothesis: the probability of x being yes is equal to 0.4. Since the sample size is large, we can use the \(z\)-test for proportions. In R, the prop.test() function performs a proportion test; the data are provided as the number of successes x and the number of trials n:

prop.test(x = sum(x), n = length(x), p = 0.4)
## 
##  1-sample proportions test with continuity correction
## 
## data:  sum(x) out of length(x), null probability 0.4
## X-squared = 2.1007, df = 1, p-value = 0.1472
## alternative hypothesis: true p is not equal to 0.4
## 95 percent confidence interval:
##  0.1920041 0.4337324
## sample estimates:
##   p 
## 0.3

Alternatively, we can create a one-row matrix containing the numbers of successes and failures:

mat <- matrix(data = c(sum(x), sum(1-x)), nrow = 1, ncol = 2)

The following code provides us with the same results:

prop.test(mat, p = 0.4)
## 
##  1-sample proportions test with continuity correction
## 
## data:  mat, null probability 0.4
## X-squared = 2.1007, df = 1, p-value = 0.1472
## alternative hypothesis: true p is not equal to 0.4
## 95 percent confidence interval:
##  0.1920041 0.4337324
## sample estimates:
##   p 
## 0.3

Note that, by default, prop.test() applies Yates' continuity correction. To omit this correction, we can set the argument correct = FALSE:

prop.test(x = sum(x), n = length(x), p = 0.4, correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  sum(x) out of length(x), null probability 0.4
## X-squared = 2.5, df = 1, p-value = 0.1138
## alternative hypothesis: true p is not equal to 0.4
## 95 percent confidence interval:
##  0.1989817 0.4250871
## sample estimates:
##   p 
## 0.3

If we want to test whether the proportion is less than 0.4 (one-tailed test), we can change the argument alternative:

prop.test(sum(x), n = length(x), p = 0.4, correct = FALSE,
          alternative = "less")
## 
##  1-sample proportions test without continuity correction
## 
## data:  sum(x) out of length(x), null probability 0.4
## X-squared = 2.5, df = 1, p-value = 0.05692
## alternative hypothesis: true p is less than 0.4
## 95 percent confidence interval:
##  0.0000000 0.4042081
## sample estimates:
##   p 
## 0.3

Or, if we want to test whether the proportion is greater than 0.4 (one-tailed test), use this:

prop.test(sum(x), n = length(x), p = 0.4, correct = FALSE,
          alternative = "greater")
## 
##  1-sample proportions test without continuity correction
## 
## data:  sum(x) out of length(x), null probability 0.4
## X-squared = 2.5, df = 1, p-value = 0.9431
## alternative hypothesis: true p is greater than 0.4
## 95 percent confidence interval:
##  0.2130506 1.0000000
## sample estimates:
##   p 
## 0.3

Using the formula \(z = \frac{\hat{p}-\pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}\), where \(\hat{p}\) is the sample proportion and \(\pi_0\) the hypothesized proportion, we can calculate the test statistic by hand:

test_res <- (sum(x)/length(x) - 0.4) / (sqrt((0.4*0.6)/(length(x))))

Since the test statistic is negative here, the p-value for a two-tailed test is twice the lower-tail probability:

pVal <- 2 * pnorm(test_res, lower.tail = TRUE)
pVal
## [1] 0.1138463
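
A form that gives the same value regardless of the sign of the test statistic (a small equivalent sketch):

# two-tailed p-value based on the absolute value of the statistic;
# should equal pVal above (0.1138463)
2 * pnorm(abs(test_res), lower.tail = FALSE)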

Important note: the prop.test() function does not actually perform a \(z\)-test. It performs a chi-square test on one categorical variable with two states (success and failure).
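
As a quick check (a minimal sketch using the same counts), a chi-square goodness-of-fit test with null probabilities 0.4 and 0.6 reproduces the statistic reported by prop.test() without the continuity correction; we return to this test in the goodness-of-fit section below:

# goodness-of-fit chi-square on the success/failure counts;
# the statistic should match X-squared = 2.5 from prop.test(..., correct = FALSE)
chisq.test(x = c(sum(x), sum(1 - x)), p = c(0.4, 0.6))$statistic
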
Connection between the \(\chi^2\) and \(z\) distributions

If \(X\) is a random variable following the standard normal (\(z\)) distribution, then the random variable \(X^2\) follows the \(\chi^2\) distribution with one degree of freedom:

test_res^2 
## [1] 2.5
prop.test(x = sum(x), n = length(x), p = 0.4, correct = FALSE)$statistic
## X-squared 
##       2.5
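
Consistently, the upper-tail \(\chi^2_1\) probability at this squared statistic equals the two-tailed p-value computed above (a small sketch):

# upper-tail chi-square probability with 1 degree of freedom;
# should equal the two-tailed p-value (0.1138463)
pchisq(test_res^2, df = 1, lower.tail = FALSE)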

Let’s now assume that we only have 20 subjects (small sample size). We first create the data:

set.seed(123)
x <- rbinom(n = 20, size = 1, prob = 0.3)

Null hypothesis: the probability of x being yes is equal to 0.4. Since the sample size is small, we use the exact binomial test:

binom.test(x = sum(x), n = length(x), p = 0.4)
## 
##  Exact binomial test
## 
## data:  sum(x) and length(x)
## number of successes = 7, number of trials = 20, p-value = 0.8203
## alternative hypothesis: true probability of success is not equal to 0.4
## 95 percent confidence interval:
##  0.1539092 0.5921885
## sample estimates:
## probability of success 
##                   0.35
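
For intuition, the two-sided p-value of the exact test can be reproduced by summing the probabilities of all outcomes that are no more likely than the observed one under the null hypothesis; a minimal sketch of this calculation (the small tolerance guards against floating-point ties):

# binomial probabilities of all possible outcomes under H0: p = 0.4
d <- dbinom(0:20, size = 20, prob = 0.4)
# sum the probabilities of outcomes no more likely than the observed 7 successes;
# should be close to the p-value 0.8203 reported above
sum(d[d <= dbinom(7, size = 20, prob = 0.4) * (1 + 1e-07)])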

The \(\chi^2\) goodness-of-fit test

We first create the data. In particular, we create a categorical vector with two categories:

set.seed(123)
x <- rbinom(n = 60, size = 1, prob = 0.3)

Null hypothesis: the two categories occur with equal probability:

chisq.test(x = c(sum(x), sum(1-x)))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(sum(x), sum(1 - x))
## X-squared = 9.6, df = 1, p-value = 0.001946
chisq.test(x = as.table(c(sum(x), sum(1-x)))) 
## 
##  Chi-squared test for given probabilities
## 
## data:  as.table(c(sum(x), sum(1 - x)))
## X-squared = 9.6, df = 1, p-value = 0.001946

Null hypothesis: the two categories have probabilities equal to 0.4 and 0.6 respectively:

chisq.test(x = c(sum(x), sum(1-x)), p = c(0.4, 0.6))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(sum(x), sum(1 - x))
## X-squared = 2.5, df = 1, p-value = 0.1138
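
The statistic can also be computed by hand from the observed and expected counts, \(X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}\) (a short sketch):

observed <- c(sum(x), sum(1 - x))       # observed counts
expected <- length(x) * c(0.4, 0.6)     # expected counts under the null hypothesis
sum((observed - expected)^2 / expected) # should match X-squared = 2.5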

Two-sample test

We first create the data:

set.seed(123)
x <- rbinom(n = 60, size = 1, prob = 0.3)
y <- rbinom(n = 60, size = 1, prob = 0.6)

Null hypothesis: the probability of success in x is equal to the probability of success in y. Since the sample size is large, we can use the \(z\)-test for proportions:

prop.test(x = c(sum(x), sum(y)), n = c(length(x), length(y)))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(sum(x), sum(y)) out of c(length(x), length(y))
## X-squared = 8.6511, df = 1, p-value = 0.003269
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.47031316 -0.09635351
## sample estimates:
##    prop 1    prop 2 
## 0.3000000 0.5833333
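
Equivalently, prop.test() accepts a matrix with one row per group and the successes and failures in the columns; this small sketch should reproduce the results above:

# one row per group, columns = successes and failures
counts <- rbind(c(sum(x), sum(1 - x)),
                c(sum(y), sum(1 - y)))
prop.test(counts)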

Test of independence for two variables

We first create the data:

set.seed(123)
x <- rbinom(n = 60, size = 1, prob = 0.5)
y <- rbinom(n = 60, size = 1, prob = 0.3)
mat <- table(x, y)
dimnames(mat) <- list(gender = c("F", "M"),
                      treatment = c("yes","no"))
mat
##       treatment
## gender yes no
##      F  25  6
##      M  17 12

Null hypothesis: there is no association between gender and treatment. We use the \(\chi^2\) test:

chisq.test(mat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mat
## X-squared = 2.4917, df = 1, p-value = 0.1145

Note that here we have two categories per variable, but the test also applies when the variables have more categories.
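
The expected counts under independence can be extracted from the returned test object; they are useful for judging whether the \(\chi^2\) approximation is reasonable (a common rule of thumb asks for all expected counts to be at least 5). A minimal sketch:

# expected cell counts under the null hypothesis of independence
chisq.test(mat)$expected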

Let’s now assume that we only have 20 subjects (small sample size). We first create the data:

set.seed(123)
x <- rbinom(n = 20, size = 1, prob = 0.3)
y <- rbinom(n = 20, size = 1, prob = 0.6)
mat <- table(x, y)
dimnames(mat) <- list(gender = c("F", "M"),
                      treatment = c("yes","no"))
mat
##       treatment
## gender yes no
##      F   7  6
##      M   4  3

Null hypothesis: there is no association between gender and treatment. In this case, we use Fisher's exact test:

fisher.test(mat)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mat
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.0898094 7.8308064
## sample estimates:
## odds ratio 
##   0.880853
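
For comparison, the simple cross-product (sample) odds ratio can be computed directly from the table; note that fisher.test() reports the conditional maximum likelihood estimate, which can differ slightly from this value (a small sketch):

# sample (cross-product) odds ratio
(mat[1, 1] * mat[2, 2]) / (mat[1, 2] * mat[2, 1])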